Multi-layer analysis of translation corpora: methodological issues and practical implications
Abstract
This paper discusses an application of multilingual, multi-layer corpus analysis from translation studies. The concrete context is the empirical testing of hypotheses about the specific properties of translations, such as explicitation, simplification, sanitization or normalization. Some of these assumed properties can be tested with rather shallow measures that operate at the level of words (e.g., type-token ratio or lexical density). Others require the analysis of more abstract linguistic features at the level of the clause, and it is also necessary to refer to linguistic features across different strata of linguistic organization (semantics, grammar, discourse). We present selected hypotheses about the nature of translations which have been tested on a bilingual English-German corpus. We discuss the corpus design adopted and the requirements on the techniques needed to extract information from this corpus, and we show how we combine standard corpus analysis techniques, such as string-based concordancing and part-of-speech tagging, with semi-automatic corpus analysis tools. We conclude with a summary and a list of issues for future work.
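The word-level measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the function names and the tiny `FUNCTION_WORDS` list are assumptions made purely for illustration, and a real study would more plausibly identify content words from part-of-speech tags.

```python
import re

# Hypothetical, deliberately tiny list of English function words (an assumption
# for this sketch); a real study would rely on part-of-speech tags instead.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "in", "on", "and", "or", "to",
    "is", "are", "was", "were", "it", "that", "this",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def type_token_ratio(tokens):
    """Number of distinct word forms divided by the number of running words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(tokens):
    """Share of tokens that are not in the function-word list."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat")
    print(type_token_ratio(tokens))  # 5 types over 6 tokens
    print(lexical_density(tokens))   # 3 content words over 6 tokens
```

Both values depend on text length, so comparing originals with translations in this way is only meaningful on samples of similar size.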
Similar resources
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the different parts. Persian has multi-part words as well. Based on Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...
Multiple Tokenizations in a Diachronic Corpus
This paper deals with the construction of a maximally flexible corpus architecture for building and analyzing diachronic corpora. Historical data poses many challenges with regard to representation and analysis, and diachronic corpora are even more varied and unsystematic (Claridge, 2008). Since historical and diachronic corpora are so difficult and expensive to build, it is crucial that they b...
Team-Based Integrated Knowledge Translation for Enhancing Quality of Life in Long-term Care Settings: A Multi-method, Multi-sectoral Research Design
Multi-sectoral, interdisciplinary health research is increasingly recognizing integrated knowledge translation (iKT) as essential. It is characterized by diverse research partnerships, and iterative knowledge engagement, translation processes and democratized knowledge production. This paper reviews the methodological complexity and decision-making of a large iKT projec...
Querying Multi-Layer Annotation and Alignment in Translation Corpora
When dealing with linguistically annotated and aligned corpora current research concentrates mainly on the investigation of translation properties. However, annotated and aligned corpora can be useful for practical translation as well, since translators also work with parallel corpora. Translators typically use raw sentence aligned corpora stored in translation memories. In this paper we will s...
Methodological cross-fertilization: empirical methodologies in (computational) linguistics and translation studies
Recent years have seen attempts at improving empirical methodologies in contrastive linguistics and in translation studies through interdisciplinary collaboration with multi-layer corpus architectures in computational linguistics. At the same time, explanatory background for empirical results is increasingly sought in more sophisticated models of language contact in typologically based contrast...
Publication date: 2005